Abstract


This document describes an RTP payload format for efficient and
flexible transport of audio data encoded with the Adaptive
TRansform Audio Coding (ATRAC) family of codecs. Recent enhancements
to the ATRAC family of codecs support high-quality audio coding with
multiple channels. The RTP payload format as presented in this
document also includes support for data fragmentation, elementary
redundancy measures, and a variation on scalable streaming.


Status of This Memo 


This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info). 

10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow 
modifications of such material outside the IETF Standards Process. 
Without obtaining an adequate license from the person(s) controlling 
the copyright in such materials, this document may not be modified 
outside the IETF Standards Process, and derivative works of it may 
not be created outside the IETF Standards Process, except to format 
it for publication as an RFC or to translate it into languages other 
than English. 


1. Introduction


The ATRAC family of perceptual audio codecs is designed to address 
numerous needs for high-quality, low-bit-rate audio transfer. ATRAC 
technology can be found in many consumer and professional products 
and applications, including MD players, CD players, voice recorders, 
and mobile phones. 


Recent advances in ATRAC technology allow for multiple channels of 
audio to be encoded in customizable groupings. This should allow for 
future expansions in scaled streaming to provide the greatest 
flexibility in streaming any one of the ATRAC family member codecs; 
however, this payload format does not distinguish between the codecs 
on a packet level. 


This simplified payload format contains only the basic information 
needed to disassemble a packet of ATRAC audio in order to decode it. 
There is also basic support for fragmentation and redundancy. 


2. Conventions Used in This Document


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
document are to be interpreted as described in RFC 2119 [4]. 


3. Codec-Specific Details


Early versions of the ATRAC codec handled only two channels of audio 
at 44.1 kHz sampling frequency, with typical bit-rates between 66 
kbps and 132 kbps. The latest version allows for a maximum of 8 
channels of audio, up to 96 kHz in sampling frequency, and a lossless 
encoding option that can be transmitted in either a scalable (also 
known as High-Speed Transfer mode) or standard (aka Standard mode) 
format. The feasible bit-rate range has also expanded, allowing from 
a low of 8 kbps up to 1400 kbps in lossy encoding modes. 


Depending on the version of ATRAC used, the sample-frame size is 
either 512, 1024, or 2048 samples. While the lossy and Standard mode 
lossless formats are encoded as sequential single audio frames, 
High-Speed Transfer mode lossless data comprises two layers -- a 
lossy base layer and an enhancement layer. 


Although streaming of multi-channel audio is supported depending on 
the ATRAC version used, all encoded audio for a given time period is 
contained within a single frame. Therefore, there is no interleaving 
nor splitting of audio data on a per-channel basis with which to be 
concerned. 


4. RTP Packetization and Transport of ATRAC-Family Streams

4.1. ATRAC Frames 


For transportation of compressed audio data, ATRAC uses the concept 
of frames. ATRAC frames are the smallest data unit for which timing 
information is attributed. Frames are octet-aligned by definition. 


4.2. Concatenation of Frames 


It is often possible to carry multiple frames in one RTP packet.
This can be useful for audio: on a LAN with a 1500-byte MTU, an
average of 7 complete 64 kbps ATRAC frames could be carried in a
single RTP packet, as each ATRAC frame would be approximately 200
bytes. ATRAC frames may be of fixed or variable length. To
facilitate parsing in the case of multiple frames in one RTP packet, 
the size of each frame is made known to the receiver by carrying 
"in-band" the frame size for each contained frame in an RTP packet. 
However, to simplify the implementation of RTP receivers, it is 
required that when multiple frames are carried in an RTP packet, each 
frame MUST be complete, i.e., the number of frames in an RTP packet 
MUST be integral. 
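
The arithmetic behind the "7 frames in a 1500-byte MTU" figure can be
sketched as follows. This is an illustrative calculation only. It
assumes IPv4 without options, UDP, and an RTP header with no CSRC
entries (40 bytes in total), plus the one-byte ATRAC header and the
two-byte per-frame prefix defined later in this document. The
function and constant names are hypothetical.

```python
# Assumed per-packet overhead: IPv4 (20) + UDP (8) + RTP fixed header (12).
IP_UDP_RTP_OVERHEAD = 20 + 8 + 12
ATRAC_HEADER_BYTES = 1       # ATRAC Header Section (one octet)
FRAME_PREFIX_BYTES = 2       # 1-bit layer flag + 15-bit Block Length
MAX_FRAMES_PER_PACKET = 16   # limit imposed by the 4-bit NFrames field


def frames_per_packet(mtu_bytes, frame_bytes):
    """Return how many complete frames of frame_bytes fit in one packet.

    A result of 0 means that not even one complete frame fits, so the
    frame would have to be fragmented instead.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    room = mtu_bytes - IP_UDP_RTP_OVERHEAD - ATRAC_HEADER_BYTES
    if room <= 0:
        return 0
    count = room // (frame_bytes + FRAME_PREFIX_BYTES)
    return min(count, MAX_FRAMES_PER_PACKET)
```

With these assumptions, frames_per_packet(1500, 200) yields 7, which
agrees with the estimate given above.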


4.3. Frame Fragmentation 


The ATRAC codec can handle very large frames. As most IP networks 
have significantly smaller MTU sizes than the frame sizes ATRAC can 
handle, this payload format allows for the fragmentation of an ATRAC 
frame over multiple RTP packets. However, to simplify the 
implementation of RTP receivers, an RTP packet MUST carry either one 
or more complete ATRAC frames or a single fragment of one ATRAC 
frame. In other words, RTP packets MUST NOT contain fragments of 
multiple ATRAC frames and MUST NOT contain a mix of complete and 
fragmented frames. 


4.4. Transmission of Redundant Frames 


As RTP does not guarantee reliable transmission, receipt of data is 
not assured. Loss of a packet can result in a "decoding gap" at the 
receiver. One method to remedy this problem is to allow time-shifted 
copies of ATRAC frames to be sent along with current data. For a
modest cost in latency and implementation complexity, error 
resiliency to packet loss can be achieved. For further details, see 
Section 5.3.2.1 and [12]. 


4.5. Scalable Lossless Streaming (High-Speed Transfer Mode)


As ATRAC supports a variation on scalable encoding, this payload 
format provides a mechanism for transmitting essential data (also 
referred to as the base layer) with its enhancement data in two ways 
-- multiplexed through one session or separated over two sessions. 


In either method, only the base layer is essential in producing audio 
data. The enhancement layer carries the remaining audio data needed 
to decode lossless audio data. So in situations of limited 
bandwidth, the sender may choose not to transmit enhancement data yet 
still provide a client with enough data to generate lossily-encoded 
audio through the base layer. 


4.5.1. Scalable Multiplexed Streaming 


In multiplexed streaming, the base layer and enhancement layer are 
coupled together in each packet, utilizing only one session as 
illustrated in Figure 1. 


The packet MUST begin with the base layer, and the two layer types
MUST interleave if both layers exist in a packet. (Only the base
layer or only the enhancement layer is included in a packet at the
beginning of a stream or during fragmentation.)


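+----------------+  +----------------+  +----------------+
|Base|Enhancement|--|Base|Enhancement|--|Base|Enhancement| ...
+----------------+  +----------------+  +----------------+
        N                  N+1                 N+2         : Packet


Figure 1. Multiplexed Structure

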
4.5.2. Scalable Multi-Session Streaming 


In multi-session streaming, the base layer and enhancement layer are 
sent over two separate sessions, allowing clients with certain 
bandwidth limitations to receive just the base layer for decoding as 
illustrated in Figure 2. 


In this case, the receiver is REQUIRED to determine which sessions
are paired together. For paired base and enhancement layer
sessions, the CNAME bindings in the RTP Control Protocol (RTCP)
session MUST be applied using the same CNAME to ensure correct
mapping to the RTP source.

While there may be alternative methods for synchronization of the
layers, the timestamp SHOULD be used for synchronizing the base layer
with its enhancement. The two sessions MUST be synchronized using
the information in RTCP SR packets to align the RTP timestamps.

If the enhancement layer session's data does not arrive by the
presentation time, the decoder MUST decode the base layer session's
data only, ignoring the enhancement layer's data.


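Session 1:
+------+  +------+  +------+  +------+
| Base |--| Base |--| Base |--| Base | ...
+------+  +------+  +------+  +------+
   N         N+1       N+2       N+3     : Packet

Session 2:
+-------------+  +-------------+  +-------------+
| Enhancement |--| Enhancement |--| Enhancement | ...
+-------------+  +-------------+  +-------------+
       N               N+1              N+2        : Packet


Figure 2. Multi-Session Streaming

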
5. Payload Format 
5.1. Global Structure of Payload Format 


The structure of ATRAC Payload is illustrated in Figure 3. The RTP
payload following the RTP header contains two octet-aligned data
sections.


+------+--------------+----------------------------+
|RTP   | ATRAC Header |    ATRAC Frames Section    |
|Header| Section      | (including redundant data) |
+------+--------------+----------------------------+
       <------------ RTP Packet Payload ----------->


Figure 3. Structure of RTP Payload of ATRAC Family


The first data section is the ATRAC Header, containing just one 
header with information for the whole packet. The second section is 
where the encoded ATRAC frames are stored. This may contain either a 
single fragment of one ATRAC frame or one or more complete ATRAC 
frames. The ATRAC Frames Section MUST NOT be empty. When using the
redundancy mechanism described in Section 5.3.2.1, the redundant
frame data can be included in this section, and the timestamp MUST be
set to the oldest redundant frame's timestamp.

To benefit from ATRAC's High-Speed Transfer mode lossless encoding
capability, the RTP payload can be split across two sessions, with 
one transmitting an essential base layer and the other transmitting 
enhancement data. However, in either case, the above structure still 
applies. 


5.2. Usage of RTP Header Fields 


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Figure 4. RTP Standard Header Part


The structure of the RTP Standard Header Part is illustrated in 
Figure 4. 


Version(V): 2 bits 
Set to 2. 


Padding(P): 1 bit 

If the padding bit is set, the packet contains one or more additional 
padding octets at the end, which are not part of the payload. The 
last octet of the padding contains a count of how many padding octets 
should be ignored, including itself. Padding may be needed by some 
encryption algorithms with fixed block sizes or for carrying several 
RTP packets in a lower-layer protocol data unit (see [1]). 


Extension(X): 1 bit 
Defined by the RTP profile used. 


CSRC count(CC): 4 bits 
See RFC 3550 [1]. 


Marker (M): 1 bit 


Set to 1 if the packet is the first packet after a silence period; 
otherwise, it MUST be set to 0. 


Payload Type (PT): 7 bits

The assignment of an RTP payload type for this packet format is 
outside the scope of this document; it is specified by the RTP 
profile under which this payload format is used, or signaled 
dynamically out-of-band (e.g., using the Session Description Protocol 
(SDP)). 


sequence number: 16 bits 
A sequential number for the RTP packet. It ranges from 0 to 65535 
and repeats itself periodically. 


Timestamp: 32 bits 

A timestamp representing the sampling time of the first sample of the 
first ATRAC frame in the current RTP packet. 

When using SDP, the clock rate of the RTP timestamp MUST be expressed 
using the "rtpmap" attribute. For ATRAC3 and ATRAC Advanced 
Lossless, the RTP timestamp rate MUST be 44100 Hz. For ATRAC-X, the 
RTP timestamp rate is 44100 Hz or 48000 Hz, and it will be selected 
by out-of-band signaling. 


SSRC: 32 bits 
See RFC 3550 [1]. 


CSRC list: 0 to 15 items, 32 bits each 
See RFC 3550 [1]. 
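
The field layout described above can be read with a few lines of
code. The following is a minimal sketch, not a complete RTP
implementation: it decodes only the fixed header and the CSRC list,
and it does not skip an RTP header extension or strip padding. The
function name and the dictionary keys are hypothetical.

```python
import struct


def parse_rtp_header(packet):
    """Decode the RTP fixed header and CSRC list from raw packet bytes."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    csrc_count = b0 & 0x0F
    payload_offset = 12 + 4 * csrc_count
    if len(packet) < payload_offset:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return {
        "version": b0 >> 6,             # V: 2 bits, expected to be 2
        "padding": bool(b0 & 0x20),     # P: 1 bit
        "extension": bool(b0 & 0x10),   # X: 1 bit (extension not parsed)
        "csrc_count": csrc_count,       # CC: 4 bits
        "marker": bool(b1 & 0x80),      # M: first packet after silence
        "payload_type": b1 & 0x7F,      # PT: 7 bits
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        # Offset of the ATRAC header byte, valid only when X is 0.
        "payload_offset": payload_offset,
    }
```

A receiver would typically check that "version" is 2 before using any
other field.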


5.3. RTP Payload Structure
5.3.1. Usage of ATRAC Header Section 


The ATRAC header section has the fixed length of one byte as 
illustrated in Figure 5. 


 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|C|FrgNo|NFrames|
+-+-+-+-+-+-+-+-+


Figure 5. ATRAC RTP Header


Continuation Flag (C) : 1 bit 

The packet that corresponds to the last part of the audio frame data 
in a fragmentation MUST have this bit set to 0; otherwise, it's set 
to 1. 


Fragment Number (FrgNo): 3 bits 


In the event of data fragmentation, this value is one for the first 
packet, and increases sequentially for the remaining fragmented data 
packets. This value MUST be zero for an unfragmented frame. (Note:
3 bits is sufficient to avoid Fragment Number rollover given the 
current maximum supported bit-rate in the ATRAC specification. If 
that changes, the choice of 3 bits for the Fragment Number should be 
revisited.) 

Number of Frames (NFrames): 4 bits 

The number of audio frames in this packet is the field value + 1.
This allows for a maximum of 16 ATRAC-encoded audio frames per
packet, with 0 indicating one audio frame. Each audio frame MUST be
complete in the packet if fragmentation is not applied. In the case
of fragmentation, the data for only one audio frame is allowed to be
fragmented, and this value MUST be 0.
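
The one-byte ATRAC header can be packed and unpacked with simple bit
operations. The sketch below assumes network bit order, with bit 0
of Figure 5 as the most significant bit, and it takes the actual
number of frames (1 to 16) rather than the raw field value. The
function names are hypothetical.

```python
def pack_atrac_header(continuation, fragment_number, frame_count):
    """Build the ATRAC header octet: C (1 bit), FrgNo (3), NFrames (4)."""
    if not 0 <= fragment_number <= 7:
        raise ValueError("Fragment Number must fit in 3 bits")
    if not 1 <= frame_count <= 16:
        raise ValueError("a packet carries between 1 and 16 frames")
    c_bit = 1 if continuation else 0
    # NFrames stores the frame count minus one.
    return (c_bit << 7) | (fragment_number << 4) | (frame_count - 1)


def unpack_atrac_header(octet):
    """Return (continuation, fragment_number, frame_count) for one octet."""
    continuation = bool((octet >> 7) & 0x1)
    fragment_number = (octet >> 4) & 0x7
    frame_count = (octet & 0xF) + 1
    return continuation, fragment_number, frame_count
```

For example, an unfragmented packet carrying three frames is encoded
as 0x02, and the first fragment of a fragmented frame as 0x90.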


5.3.2. Usage of ATRAC Frames Section 


The ATRAC Frames Section contains an integer number of complete ATRAC 
frames or a single fragment of one ATRAC frame, as illustrated in 
Figure 6. Each ATRAC frame is preceded by a one-bit flag indicating 
the layer type and a Block Length field indicating the size in bytes 
of the ATRAC frame. If more than one ATRAC frame is present, then 
the frames are concatenated into a contiguous string of bit-flag, 
Block Length, and ATRAC frame in order of their frame number. This 
section MUST NOT be empty. 


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E|        Block Length         |        ATRAC frame ...        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Figure 6. ATRAC Frame Section Format


Layer Type Flag (E): 1 bit 
Set to 1 if the corresponding ATRAC frame is from an enhancement 
layer. 0 indicates a base layer encoded frame. 


Block length: 15 bits 

The byte length of encoded audio data for the following frame. This 
is so that in the case of fragmentation, if only a subsequent packet 
is received, decoding can still occur. 15 bits allows for a maximum 
block length of 32,767 bytes. 


ATRAC frame: The encoded ATRAC audio data. 
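
Walking the ATRAC Frames Section of an unfragmented packet can be
sketched as below. The sketch assumes that the caller has already
read the frame count from the ATRAC header and that every frame is
complete. It does not attempt to handle fragments. The function name
is hypothetical.

```python
import struct


def parse_frames_section(data, frame_count):
    """Split an ATRAC Frames Section into (is_enhancement, frame) pairs."""
    frames = []
    offset = 0
    for _ in range(frame_count):
        if offset + 2 > len(data):
            raise ValueError("truncated Layer Type / Block Length prefix")
        (prefix,) = struct.unpack_from("!H", data, offset)
        is_enhancement = bool(prefix >> 15)   # Layer Type Flag (E)
        block_length = prefix & 0x7FFF        # 15-bit length in bytes
        offset += 2
        if offset + block_length > len(data):
            raise ValueError("frame data shorter than Block Length")
        frames.append((is_enhancement, data[offset:offset + block_length]))
        offset += block_length
    return frames
```

Each returned pair tells the decoder whether the frame belongs to the
base layer or the enhancement layer, followed by the raw frame bytes.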


5.3.2.1. Support of Redundancy


This payload format provides a rudimentary scheme to compensate for 
occasional packet loss. As every packet’s timestamp corresponds to 
the first audio frame regardless of whether or not it is redundant, 
and because we know how many frames of audio each packet 
encapsulates, if two successive packets are successfully transmitted, 
we can calculate the number of redundant frames being sent. The 
result gives the client a sense of how the server is responding to 
RTCP reports and warns it to expand its buffer size if necessary. As 
an example of using the Redundant Data, refer to Figures 7 and 8. 


In this example, the server has determined that for the next few 
packets, it should send the last two frames from the previous packet 
due to recent RTCP reports. Thus, between packets N and N+1, there
is a redundancy of two frames (of which the client may choose to 
dispose). The benefit arises when packets N+2 and N+3 do not arrive 
at all, after which eventually packet N+4 arrives with successive 
necessary audio frame data. 


[Sender]
|-Fr0-|-Fr1-|-Fr2-|                          Packet: N,   TS=0
      |-Fr1-|-Fr2-|-Fr3-|                    Packet: N+1, TS=1024
            |-Fr2-|-Fr3-|-Fr4-|              Packet: N+2, TS=2048
                  |-Fr3-|-Fr4-|-Fr5-|        Packet: N+3, TS=3072
                        |-Fr4-|-Fr5-|-Fr6-|  Packet: N+4, TS=4096

-----------> Packet "N+2" and "N+3" not arrived ------------->

[Receiver]
|-Fr0-|-Fr1-|-Fr2-|                          Packet: N,   TS=0
      |-Fr1-|-Fr2-|-Fr3-|                    Packet: N+1, TS=1024
                        |-Fr4-|-Fr5-|-Fr6-|  Packet: N+4, TS=4096


The receiver can decode from Fr4 to Fr6 by using Packet "N+4" data
even if the packet loss of "N+2" and "N+3" has occurred.


Figure 7. Redundant Example
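
The calculation of the number of redundant frames from two
successive packets can be sketched as follows. The sketch assumes
that the receiver knows the number of samples per frame (1024 in the
example above) and that both packets carry complete frames. A
negative result is interpreted here as a gap of that many missing
frames, which is an interpretation made for this example rather than
something this document defines. The function name is hypothetical.

```python
def redundant_frame_count(prev_timestamp, prev_frame_count,
                          cur_timestamp, samples_per_frame):
    """Number of frames in the current packet already seen in the previous.

    The previous packet ends at prev_timestamp plus its frame count times
    the frame size. Any overlap with the current timestamp is redundancy.
    Arithmetic is done modulo 2**32 so that timestamp wraparound works.
    """
    end_of_previous = prev_timestamp + prev_frame_count * samples_per_frame
    overlap = (end_of_previous - cur_timestamp) & 0xFFFFFFFF
    if overlap >= 0x80000000:
        overlap -= 0x100000000   # treat as a signed 32-bit difference
    return overlap // samples_per_frame
```

For packets N and N+1 of Figure 7 this gives two redundant frames,
and for packets N+1 and N+4 it gives zero, since Fr4 directly follows
Fr3.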

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            timestamp (= start sample time of Fr1)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|  0  |   2   |0|        Block Length         |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|            (redundant) ATRAC frame (Fr1) data ...             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|        Block Length         | (redundant) ATRAC frame (Fr2) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    (cont.)    |0|        Block Length         |ATRAC frame Fr3|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            (cont.)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Figure 8. Packet Structure Example with Redundant Data
(Case of Packet "N+1")


5.3.2.2. Frame Fragmentation 


Each RTP packet MUST contain either an integer number of ATRAC- 
encoded audio frames (with a maximum of 16) or one ATRAC frame 
fragment. In the former case, as many complete ATRAC frames as can 
fit in a single path-MTU SHOULD be placed in an RTP packet. However, 
if even a single ATRAC frame will not fit into a complete RTP packet, 
the ATRAC frame MUST be fragmented. 


The start of a fragmented frame gets placed in its own RTP packet 
with its Continuation bit (C) set to one, and its Fragment Number 
(FrgNo) set to one. As the frame must be the only one in the
packet, the Number of Frames field is zero. Subsequent packets are 
to contain the remaining fragmented frame data, with the Fragment 
Number increasing sequentially and the Continuation bit (C) 
consistently set to one. As subsequent packets do not contain any 
new frames, the Number of Frames field MUST be ignored. The last 
packet of fragmented data MUST have the Continuation bit (C) set to 
zero. 

Packets containing related fragmented frames MUST have identical
timestamps. Thus, while the Continuation bit and Fragment Number
fields indicate fragmentation and a means to reorder the packets, the
timestamp can be used to determine which packets go together.
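
The fragmentation rules of this section can be sketched as follows.
The sketch produces only the ATRAC header octet and the slice of
frame data for each fragment. The Layer Type Flag and Block Length
prefix, the RTP header, and the shared timestamp are left to the
caller. The function name and the max_chunk_bytes parameter are
hypothetical.

```python
def fragment_frame(frame, max_chunk_bytes):
    """Split one ATRAC frame into (atrac_header_octet, chunk) pairs.

    Fragment Numbers start at one and increase sequentially. The
    Continuation bit is one on every fragment except the last, and the
    Number of Frames field is zero throughout.
    """
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    chunks = [frame[i:i + max_chunk_bytes]
              for i in range(0, len(frame), max_chunk_bytes)]
    if len(chunks) < 2:
        raise ValueError("frame fits in one packet; send it unfragmented")
    if len(chunks) > 7:
        raise ValueError("Fragment Number would not fit in 3 bits")
    fragments = []
    for number, chunk in enumerate(chunks, start=1):
        continuation = 0 if number == len(chunks) else 1
        header = (continuation << 7) | (number << 4)  # NFrames field = 0
        fragments.append((header, chunk))
    return fragments
```

A 2500-byte frame split into chunks of at most 1000 bytes yields the
header octets 0x90, 0xA0, and 0x30, which shows the Continuation bit
dropping to zero on the final fragment.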


6. Packetization Examples

6.1. Example Multi-Frame Packet


Multiple encoded audio frames are combined into one packet. Note
how, for this example, only base layer frames are sent redundantly,
but are followed by interleaved base layer and enhancement layer
frames as illustrated in Figure 9.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|  0  |   5   |0|        Block Length         |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|            (redundant) base layer frame 1 data...             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|        Block Length         |(redundant) base layer frame 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  (cont.)  |0|      Block Length       |  base layer frame 3   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  (cont.)  |1|      Block Length       |  enhancement frame 3  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  (cont.)  |0|      Block Length       |  base layer frame 4   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  (cont.)  |1|      Block Length       |  enhancement frame 4  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Figure 9. Example Multi-Frame Packet


6.2. Example Fragmented ATRAC Frame


The encoded audio data frame is split over three RTP packets as 
illustrated in Figure 10. The following points are highlighted in 
the example below: 


o transition from one to zero of the Continuation bit (C) 
o sequential increase in the Fragment Number 


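Packet 1:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|  1  |   0   |1|        Block Length         |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                      enhancement data...                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Packet 2:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|  2  |   0   |1|        Block Length         |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                  ...more enhancement data...                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Packet 3:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|  3  |   0   |1|        Block Length         |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|              ...the last of the enhancement data              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Figure 10. Example Fragmented ATRAC Frame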


7. Payload Format Parameters


Certain parameters will need to be defined before
ATRAC-family-encoded content can be streamed. Other optional
parameters may also be defined to take advantage of specific features
relevant to certain ATRAC versions. Parameters for ATRAC3, ATRAC-X,
and ATRAC Advanced Lossless are defined here as part of the media
subtype registration process. A mapping of these parameters into the
Session Description Protocol (SDP) (RFC 4566) [2] is also provided
for applications that utilize SDP. These registrations use the
template defined in RFC 4288 [5] and follow RFC 4855 [6].


The data format and parameters are specified for real-time transport
in RTP.


7.1. ATRAC3 Media Type Registration


The media subtype for the Adaptive TRansform Codec version 3 (ATRAC3) 
uses the template defined in RFC 4855 [6]. 


Note, any unknown parameter MUST be ignored by the receiver. 


Type name: audio 


Subtype name: ATRAC3 

Required parameters:


rate: Represents the sampling frequency in Hz of the original audio 
data. Permissible value is 44100 only. 


baseLayer: Indicates the encoded bit-rate in kbps for the audio data 
to be streamed. Permissible values are 66, 105, and 132. 


Optional parameters: 

ptime: See RFC 4566 [2]. 

maxptime: See RFC 4566 [2]. 

The frame length of ATRAC3 is 1024/44100 = 23.22...(ms), but a
fractional value may not be applicable for the SDP definition. So
the value of the parameter MUST be a multiple of 24 (ms) considering
safe transmission.


If this parameter is not present, the sender MAY encapsulate a 
maximum of 6 encoded frames into one RTP packet, in streaming of 
ATRAC3. 
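
The rounding that leads to the 24 ms figure can be sketched as
follows. This is an illustrative calculation only. It assumes that
the intended rule is to round the frame duration up to the next whole
millisecond and then multiply by the number of frames per packet.
The function names are hypothetical.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Frame duration rounded up to a whole number of milliseconds."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-samples_per_frame * 1000 // sample_rate_hz)


def packet_time_ms(frames_per_packet, samples_per_frame, sample_rate_hz):
    """Packet time as a whole multiple of the rounded frame duration."""
    return frames_per_packet * frame_duration_ms(samples_per_frame,
                                                 sample_rate_hz)
```

For ATRAC3, frame_duration_ms(1024, 44100) evaluates to 24, so a
packet carrying six frames would be described by a value of 144 ms
under this assumption.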


maxRedundantFrames: The maximum number of redundant frames that may 
be sent during a session in any given packet under the redundant 
framing mechanism detailed in the document. Allowed values are 
integers in the range of 0 to 15, inclusive. If this parameter is 
not used, a default of 15 MUST be assumed. 


Encoding considerations: This media type is framed and contains 
binary data. 


Security considerations: This media type does not carry active 
content. See Section 9 of this document. 


Interoperability considerations: none 
Published specification: ATRAC3 Standard Specification [9] 


Applications that use this media type: 
Audio and video streaming and conferencing tools. 


Additional information: none 


Magic number(s): none 
File extension(s): 'at3', 'aa3', and 'omg'
Macintosh file type code(s): none 

Person and email address to contact for further information:
Mitsuyuki Hatanaka 

Jun Matsumoto 

actech@jp.sony.com 

Intended usage: COMMON 


Restrictions on usage: This media type depends on RTP framing, and 
hence is only defined for transfer via RTP. 


Author: 

Mitsuyuki Hatanaka 

Jun Matsumoto 

actech@jp.sony.com 

Change controller: IETF AVT WG delegated from the IESG 
7.2.  ATRAC-X Media Type Registration 


The media subtype for the Adaptive TRansform Codec version X 
(ATRAC-X) uses the template defined in RFC 4855 [6]. 


Note, any unknown parameter MUST be ignored by the receiver. 
Type name: audio 

Subtype name: ATRAC-X 

Required parameters: 


rate: Represents the sampling frequency in Hz of the original 
audio data. Permissible values are 44100 and 48000. 


baseLayer: Indicates the encoded bit-rate in kbps for the audio data 
to be streamed. Permissible values are 32, 48, 64, 96, 128, 160, 
192, 256, 320, and 352. 


channelID: Indicates the number of channels and channel layout
according to Table 1 in Section 7.4. Note that this layout is
different from that proposed in RFC 3551 [3]. However, as channelID
= 0 defines an ambiguous channel layout, the channel mapping defined
in Section 4.1 of [3] could be used. Permissible values are 0, 1, 2,
3, 4, 5, 6, 7.


Optional parameters: 


ptime: See RFC 4566 [2]. 

maxptime: See RFC 4566 [2].

The frame length of ATRAC-X is 2048/44100 = 46.44...(ms) or 
2048/48000 = 42.67...(ms), but fractional value may not be applicable 
for the SDP definition. So the value of the parameter MUST be a 
multiple of 47 (ms) or 43 (ms) considering safe transmission. 


If this parameter is not present, the sender MAY encapsulate a 
maximum of 16 encoded frames into one RTP packet, in streaming of 
ATRAC-X. 


maxRedundantFrames: The maximum number of redundant frames that may 
be sent during a session in any given packet under the redundant 
framing mechanism detailed in the document. Allowed values are 
integers in the range 0 to 15, inclusive. If this parameter is not 
used, a default of 15 MUST be assumed. 


delayMode: Indicates a desire to use low-delay features, in which
case the decoder will process received data accordingly based on this
value. Permissible values are 2 and 4.


Encoding considerations: This media type is framed and contains 
binary data. 


Security considerations: This media type does not carry active 
content. See Section 9 of this document. 


Interoperability considerations: none 
Published specification: ATRAC-X Standard Specification [10] 


Applications that use this media type: 
Audio and video streaming and conferencing tools. 


Additional information: none 

Magic number(s): none 

File extension(s): 'atx', 'aa3', and 'omg'

Macintosh file type code(s): none 

Person and email address to contact for further information: 
Mitsuyuki Hatanaka 

Jun Matsumoto 

actech@jp.sony.com 


Intended usage: COMMON 


Restrictions on usage: This media type depends on RTP framing, and 
hence is only defined for transfer via RTP. 

Author:

Mitsuyuki Hatanaka 
Jun Matsumoto 
actech@jp.sony.com 


Change controller: IETF AVT WG delegated from the IESG 
7.3. ATRAC Advanced Lossless Media Type Registration 


The media subtype for the Adaptive TRansform Codec Lossless version 
(ATRAC Advanced Lossless) uses the template defined in RFC 4855 [6]. 


Note, any unknown parameter MUST be ignored by the receiver. 

Type name: audio 

Subtype name: ATRAC-ADVANCED-LOSSLESS 

Required parameters: 

rate: Represents the sampling frequency in Hz of the original
audio data. Permissible value is 44100 only for High-Speed Transfer
mode. Any value of 24000, 32000, 44100, 48000, 64000, 88200, 96000,
176400, and 192000 can be used for Standard mode.


baseLayer: Indicates the encoded bit-rate in kbps for the base layer 
in High-Speed Transfer mode lossless encodings. 


For Standard lossless mode, this value MUST be 0. 

The Permissible values for ATRAC3 baselayer are 66, 105, and 132. 

For ATRAC-X baselayer, they are 32, 48, 64, 96, 128, 160, 192, 256, 
320, and 352. 

blockLength: Indicates the block length. In High-Speed Transfer
mode, the values 1024 and 2048 are used for ATRAC3-based and
ATRAC-X-based ATRAC Advanced Lossless streaming, respectively.


Any value of 512, 1024, and 2048 can be used for Standard mode.


channelID: Indicates the number of channels and channel layout
according to Table 1 in Section 7.4. Note that this layout is
different from that proposed in RFC 3551 [3]. However, as channelID
= 0 defines an ambiguous channel layout, the channel mapping defined
in Section 4.1 of [3] could be used in this case. Permissible values
are 0, 1, 2, 3, 4, 5, 6, 7.


ptime: See RFC 4566 [2]. 

maxptime: See RFC 4566 [2].

In streaming of ATRAC Advanced Lossless, multiple frames cannot be 
transmitted in a single RTP packet, as the frame size is large. So 
it SHOULD be regarded as the time of one encoded frame in both of the 
sender and the receiver side. The frame length of ATRAC Advanced 
Lossless is 512/44100 = 11.6...(ms), 1024/44100 = 23.22...(ms), or 
2048/44100 = 46.44...(ms), but fractional value may not be applicable 
for the SDP definition. So the value of the parameter MUST be 
12(ms), 24(ms), or 47(ms) considering safe transmission. 


Encoding considerations: This media type is framed and contains 
binary data. 


Security considerations: This media type does not carry active 
content. See Section 9 of this document. 


Interoperability considerations: none 


Published specification: 
ATRAC Advanced Lossless Standard Specification [11] 


Applications that use this media type: 
Audio and video streaming and conferencing tools. 


Additional information: none 

Magic number(s): none 

File extension(s): 'aal', 'aa3', and 'omg'

Macintosh file type code(s): none 

Person and email address to contact for further information: 
Mitsuyuki Hatanaka 

Jun Matsumoto 

actech@jp.sony.com 


Intended usage: COMMON 


Restrictions on usage: This media type depends on RTP framing, and 
hence is only defined for transfer via RTP. 


Author: 

Mitsuyuki Hatanaka 
Jun Matsumoto 
actech@jp.sony.com 


Change controller: IETF AVT WG delegated from the IESG 


7.4. Channel Mapping Configuration Table


RTP Payload Format for ATRAC Family 


July 2009 


Table 1 explains the mapping between the channelID as passed during 
SDP negotiations, and the speaker mapping the value represents. 


+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| channelID | Number of | Default Speaker     |
|           | Channels  | Mapping             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0     |  max 64   | undefined           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     1     |     1     | front: center       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     2     |     2     | front: left, right  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     3     |     3     | front: left, right  |
|           |           | front: center       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     4     |     4     | front: left, right  |
|           |           | front: center       |
|           |           | rear: surround      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     5     |    5+1    | front: left, right  |
|           |           | front: center       |
|           |           | rear: left, right   |
|           |           | LFE                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     6     |    6+1    | front: left, right  |
|           |           | front: center       |
|           |           | rear: left, right   |
|           |           | rear: center        |
|           |           | LFE                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     7     |    7+1    | front: left, right  |
|           |           | front: center       |
|           |           | rear: left, right   |
|           |           | side: left, right   |
|           |           | LFE                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Table 1. 
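
A receiver might hold Table 1 as a simple lookup. The Python sketch
below is one possible representation, not part of the specification;
the names CHANNEL_TABLE and channel_count are assumptions. The "N+1"
entries are counted as N+1 channels, which matches the "/6" used with
channelID=5 in the SDP examples of this document.

```python
# Table 1 as a lookup: channelID -> (channel count, default speaker mapping).
# channelID 0 (up to 64 channels, mapping undefined) is deliberately omitted.
FRONT3 = ("front left", "front right", "front center")

CHANNEL_TABLE = {
    1: (1, ("front center",)),
    2: (2, ("front left", "front right")),
    3: (3, FRONT3),
    4: (4, FRONT3 + ("rear surround",)),
    5: (6, FRONT3 + ("rear left", "rear right", "LFE")),
    6: (7, FRONT3 + ("rear left", "rear right", "rear center", "LFE")),
    7: (8, FRONT3 + ("rear left", "rear right",
                     "side left", "side right", "LFE")),
}


def channel_count(channel_id):
    """Return the channel count for a channelID, or None when undefined."""
    entry = CHANNEL_TABLE.get(channel_id)
    return entry[0] if entry else None
```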


7.5. Mapping Media Type Parameters into SDP


The information carried in the Media type specification has a 
specific mapping to fields in the Session Description Protocol (SDP) 
[2], which is commonly used to describe RTP sessions. When SDP is 
used to specify sessions employing the ATRAC family of codecs, the 
following mapping rules according to the ATRAC codec apply. 


For Media Subtype ATRAC3 
The Media type ("audio") goes in SDP "m=" as the media name. 


The Media subtype (payload format name) goes in SDP "a=rtpmap" as 
the encoding name. ATRAC3 supports only mono or stereo signals, 
so a corresponding number of channels (0 or 1) MUST also be 
specified in this attribute. 


The "baseLayer" parameter goes in SDP "a=fmtp". This parameter
MUST be present. "maxRedundantFrames" may follow, but if no value
is transmitted, the receiver SHOULD assume a default value of
"15".


The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and 
"a=maxptime" attributes, respectively. 


For Media Subtype ATRAC-X 
The Media type ("audio") goes in SDP "m=" as the media name.


The Media subtype (payload format name) goes in SDP "a=rtpmap" as
the encoding name. This SHOULD be followed by the "sampleRate"
(as the RTP clock rate), and then the actual number of channels
regardless of the channelID parameter.


The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and 
"a=maxptime" attributes, respectively. 


Any remaining parameters go in the SDP "a=fmtp" attribute by
copying them directly from the Media type string as a semicolon-
separated list of parameter=value pairs. The "baseLayer"
parameter MUST be the first entry on this line. The "channelID"
parameter MUST be the next entry. The receiver MUST assume a
default value of "15" for "maxRedundantFrames".


For Media Subtype ATRAC Advanced Lossless
The Media type ("audio") goes in SDP "m=" as the media name. 


The Media subtype (payload format name) goes in SDP "a=rtpmap" as 
the encoding name. This MUST be followed by the "sampleRate" (as 
the RTP clock rate), and then the actual number of channels 
regardless of the channelID parameter. 


The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and 
"a=maxptime" attributes, respectively. 


Any remaining parameters go in the SDP "a=fmtp" attribute by 
copying them directly from the Media type string as a semicolon- 
separated list of parameter=value pairs. 


On this line, the parameters "baseLayer" and "blockLength" MUST be 
present in this order. 


The value of "blockLength" MUST be 1024 when ATRAC3 is used as the
base layer and 2048 when ATRAC-X is used as the base layer. If
"baseLayer=0" (standard mode), "blockLength" MUST be 512, 1024, or
2048. The "channelID" parameter MUST be the next entry. The
receiver MUST assume a default value of "15" for
"maxRedundantFrames".


7.6. Offer/Answer Model Considerations


Some options for encoding and decoding ATRAC audio data require
the sender, the receiver, or both to comply with certain
specifications. In order to establish an interoperable transmission
framework, an Offer/Answer negotiation in SDP MUST observe the
following considerations. (See [14].)


For All Three Media Subtypes 


Each combination of the RTP payload transport format configuration
parameters (baseLayer and blockLength, sampleRate, channelID) is
unique in its bit-pattern and not compatible with any other
combination. When creating an offer in an application desiring to
use the more advanced features (sample rates above 44100 Hz, more
than two channels), the offerer SHOULD also offer a payload type
containing only the lowest set of necessary requirements. If
multiple configurations are of interest to the application, they
may all be offered.


The parameters "maxptime" and "ptime" will in most cases not 
affect interoperability; however, the setting of the parameters 
can affect the performance of the application. The SDP 
Offer/Answer handling of the "ptime" parameter is described in RFC 
3264. The "maxptime" parameter MUST be handled in the same way. 


For Media Subtype ATRAC3


In response to an offer, downgraded subsets of "baseLayer" are
possible. However, for best performance, we suggest the answer
contain the highest possible values offered.


For Media Subtype ATRAC-X 


In response to an offer, downgraded subsets of "sampleRate",
"baseLayer", and "channelID" are possible. For best performance,
an answer MUST NOT contain any values requiring further 
capabilities than the offer contains, but it SHOULD provide values 
as close as possible to those in the offer. 


The "maxRedundantFrames" is a suggested minimum. This value MAY 
be increased in an answer (with a maximum of 15), but MUST NOT be 
reduced. 
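
The "maxRedundantFrames" rule above amounts to a range check on the
answered value. The following Python sketch illustrates it; the
function and constant names are assumptions chosen for this example.

```python
MAX_REDUNDANT_FRAMES_LIMIT = 15  # upper bound stated in this section


def redundancy_answer_is_valid(offered: int, answered: int) -> bool:
    """An answer may raise the offered value up to the limit, never lower it."""
    return offered <= answered <= MAX_REDUNDANT_FRAMES_LIMIT


print(redundancy_answer_is_valid(4, 6))   # increased within the limit
print(redundancy_answer_is_valid(4, 3))   # reduced, so not acceptable
```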


The optional parameter "delayMode" is non-negotiable. If the 
Answerer cannot comply with the offered value, the session MUST be 
deemed inoperable. 


For Media Subtype ATRAC Advanced Lossless 


In response to an offer, downgraded subsets of "sampleRate",
"baseLayer", and "channelID" are possible. For best performance,
an answer MUST NOT contain any values requiring further 
capabilities than the offer contains, but it SHOULD provide values 
as close as possible to those in the offer. 


There are no requirements when negotiating "blockLength", other 
than that both parties must be in agreement. 


The "maxRedundantFrames" is a suggested minimum. This value MAY 
be increased in an answer (with a maximum of 15), but MUST NOT be 
reduced. 


For transmission of scalable multi-session streaming of ATRAC
Advanced Lossless content, the attributes of media stream
identification, group information, and decoding dependency between
the base layer stream and the enhancement layer stream MUST be
signaled in SDP by the Offer/Answer model. In this case, the
attributes "group", "mid", and "depend", each followed by the
appropriate parameter, MUST be used in SDP [7] [8] in order to
indicate layered coding dependency. The "group" attribute followed
by the "DDP" parameter indicates the relationship between the base
layer stream and the enhancement layer stream with decoding
dependency. Each stream is identified by its "mid" attribute, and
the dependency of the enhancement layer stream is defined by the
"depend" attribute, as the enhancement layer is only useful when
the base layer is available. Examples of signaling ATRAC Advanced
Lossless decoding dependency are given in Sections 7.8 and 7.9.


7.7. Usage of Declarative SDP


In declarative usage, like SDP in Real-Time Streaming Protocol (RTSP) 
[15] or Session Announcement Protocol (SAP) [16], the parameters MUST 
be interpreted as follows: 


o  The payload format configuration parameters (baseLayer,
   sampleRate, channelID) are all declarative and a participant MUST
   use the configuration(s) provided for the session. More than one
   configuration may be provided if necessary by declaring multiple
   RTP payload types; however, the number of types SHOULD be kept
   small.

o  Any "maxptime" and "ptime" values SHOULD be selected with care to
   ensure that the session's participants can achieve reasonable
   performance.

o  The attributes "mid", "group", and "depend" MUST be used for
   indicating the relationship and dependency of the base layer and
   the enhancement layer in scalable multi-session streaming of ATRAC
   Advanced Lossless content, as described in Sections 7.6, 7.8, and
   7.9.


7.8. Example SDP Session Descriptions


Example usage of ATRAC-X with stereo at 44100 Hz: 


v=0
o=atrac 2465317890 2465317890 IN IP4 service.example.com
s=ATRAC-X Streaming
c=IN IP4 192.0.2.1/127
t=3409539540 3409543140
m=audio 49120 RTP/AVP 99
a=rtpmap:99 ATRAC-X/44100/2
a=fmtp:99 baseLayer=128; channelID=2; delayMode=2
a=maxptime:47


Example usage of ATRAC-X with 5.1 setup at 48000 Hz:


v=0
o=atrac 2465317890 2465317890 IN IP4 service.example.com
s=ATRAC-X 5.1ch Streaming
c=IN IP4 192.0.2.1/127
t=3409539540 3409543140
m=audio 49120 RTP/AVP 99
a=rtpmap:99 ATRAC-X/48000/6
a=fmtp:99 baseLayer=320; channelID=5
a=maxptime:43
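

The "a=rtpmap" and "a=fmtp" lines of the two ATRAC-X examples above
follow directly from the mapping rules for Media Subtype ATRAC-X. The
Python sketch below shows one way to generate them; the function name
and its arguments are assumptions made for illustration, and only the
ordering rule ("baseLayer" first, "channelID" next) is taken from this
document.

```python
def atrac_x_media_lines(payload_type, sample_rate, channels,
                        base_layer, channel_id, extra=None):
    """Build the a=rtpmap and a=fmtp lines for one ATRAC-X payload type."""
    rtpmap = f"a=rtpmap:{payload_type} ATRAC-X/{sample_rate}/{channels}"
    # "baseLayer" must come first and "channelID" second on the fmtp line.
    params = [f"baseLayer={base_layer}", f"channelID={channel_id}"]
    # Any remaining parameters are appended as parameter=value pairs.
    params += [f"{name}={value}" for name, value in (extra or {}).items()]
    fmtp = f"a=fmtp:{payload_type} " + "; ".join(params)
    return [rtpmap, fmtp]


for line in atrac_x_media_lines(99, 44100, 2, 128, 2, {"delayMode": 2}):
    print(line)
```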


Example usage of ATRAC-Advanced-Lossless in multiplexed 
High-Speed Transfer mode: 


v=0
o=atrac 2465317890 2465317890 IN IP4 service.example.com
s=AAL Multiplexed Streaming
c=IN IP4 192.0.2.1/127
t=3409539540 3409543140
m=audio 49200 RTP/AVP 96
a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:96 baseLayer=128; blockLength=2048; channelID=2
a=maxptime:47


Example usage of ATRAC-Advanced-Lossless in multi-session High-Speed 
Transfer mode. In this case, the base layer and the enhancement 
layer stream are identified by L1 and L2, respectively, and L2 
depends on L1 in decoding. 


v=0
o=atrac 2465317890 2465317890 IN IP4 service.example.com
s=AAL Multi Session Streaming
c=IN IP4 192.0.2.1/127
t=3409539540 3409543140
a=group:DDP L1 L2
m=audio 49200 RTP/AVP 96
a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:96 baseLayer=128; blockLength=2048; channelID=2
a=maxptime:47
a=mid:L1
m=audio 49202 RTP/AVP 97
a=rtpmap:97 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:97 baseLayer=0; blockLength=2048; channelID=2
a=maxptime:47
a=mid:L2
a=depend:97 lay L1:96


Example usage of ATRAC-Advanced-Lossless in Standard mode:


m=audio 49200 RTP/AVP 99
a=rtpmap:99 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:99 baseLayer=0; blockLength=1024; channelID=2
a=maxptime:24
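

A receiver of the multi-session example needs to extract the layer
dependency from the "a=depend" line. The Python sketch below handles
only the single-dependency form used in this document's examples
("a=depend:97 lay L1:96"); the function name and the returned
dictionary keys are assumptions, and the full attribute grammar is
defined in [8].

```python
def parse_depend(line: str) -> dict:
    """Parse one a=depend line of the form used in the examples above."""
    prefix = "a=depend:"
    if not line.startswith(prefix):
        raise ValueError("not an a=depend line")
    # Fields: <payload type> <dependency type> <mid>:<payload type list>
    fmt, dependency_type, target = line[len(prefix):].split()
    mid, depends_on = target.split(":")
    return {
        "payload_type": int(fmt),
        "dependency_type": dependency_type,
        "mid": mid,
        "depends_on": [int(pt) for pt in depends_on.split(",")],
    }


print(parse_depend("a=depend:97 lay L1:96"))
```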


7.9. Example Offer/Answer Exchange


The following Offer/Answer example shows how a desire to stream 
multi-channel content is turned down by the receiver, who answers 
with only the ability to receive stereo content: 


Offer: 


m=audio 49170 RTP/AVP 98 99 
a=rtpmap:98 ATRAC-X/44100/6 
a=fmtp:98 baseLayer=320; channelID=5 
a=rtpmap:99 ATRAC-X/44100/2 
a=fmtp:99 baseLayer=160; channelID=2 


Answer: 


m=audio 49170 RTP/AVP 99 
a=rtpmap:99 ATRAC-X/44100/2 
a=fmtp:99 baseLayer=160; channelID=2 
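

The answerer's choice in the exchange above can be modeled as a filter
over the offered payload types. The following Python sketch is a
simplification under stated assumptions: each offer entry is reduced
to a (payload type, sample rate, channels) tuple, and the receiver's
capability is described only by a maximum sample rate and a maximum
channel count. Both the function name and this capability model are
hypothetical.

```python
def select_answer(offer, max_sample_rate, max_channels):
    """Return the offered payload types the answerer is able to receive."""
    return [
        payload_type
        for payload_type, sample_rate, channels in offer
        if sample_rate <= max_sample_rate and channels <= max_channels
    ]


# Offer from the example above: 5.1 content (98) and stereo content (99).
offer = [(98, 44100, 6), (99, 44100, 2)]
print(select_answer(offer, max_sample_rate=44100, max_channels=2))
```

With a stereo-only receiver, only payload type 99 remains, as in the
answer shown above.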


The following Offer/Answer example shows the receiver answering with 
a selection of supported parameters: 


Offer: 


m=audio 49170 RTP/AVP 97 98 99 
a=rtpmap:97 ATRAC-X/44100/2 
a=fmtp:97 baseLayer=128; channelID=2 
a=rtpmap:98 ATRAC-X/44100/6 
a=fmtp:98 baseLayer=128; channelID=5 
a=rtpmap:99 ATRAC-X/48000/6 
a=fmtp:99 baseLayer=320; channelID=5 


Answer: 


m=audio 49170 RTP/AVP 97 98 
a=rtpmap:97 ATRAC-X/44100/2 
a=fmtp:97 baseLayer=128; channelID=2 
a=rtpmap:98 ATRAC-X/44100/6 
a=fmtp:98 baseLayer=128; channelID=5


The following Offer/Answer example shows an exchange that negotiates
the use of ATRAC-Advanced-Lossless. The offer contains three
options: multi-session High-Speed Transfer mode, multiplexed
High-Speed Transfer mode, and Standard mode.


Offer: 


Multi-session High-Speed Transfer mode, L1 and L2 correspond
to the base layer and the enhancement layer, respectively,
and L2 depends on L1 in decoding.


a=group:DDP L1 L2
m=audio 49200 RTP/AVP 96
a=rtpmap:96 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:96 baseLayer=132; blockLength=1024; channelID=2
a=maxptime:24
a=mid:L1
m=audio 49202 RTP/AVP 97
a=rtpmap:97 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:97 baseLayer=0; blockLength=2048; channelID=2
a=maxptime:24
a=mid:L2
a=depend:97 lay L1:96


Multiplexed High-Speed Transfer mode


m=audio 49200 RTP/AVP 98
a=rtpmap:98 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:98 baseLayer=256; blockLength=2048; channelID=2
a=maxptime:47


Standard mode


m=audio 49200 RTP/AVP 99
a=rtpmap:99 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:99 baseLayer=0; blockLength=2048; channelID=2
a=maxptime:47


Answer:


a=group:DDP L1 L2
m=audio 49200 RTP/AVP 94
a=rtpmap:94 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:94 baseLayer=132; blockLength=1024; channelID=2
a=maxptime:24
a=mid:L1
m=audio 49202 RTP/AVP 95
a=rtpmap:95 ATRAC-ADVANCED-LOSSLESS/44100/2
a=fmtp:95 baseLayer=0; blockLength=2048; channelID=2
a=maxptime:24
a=mid:L2
a=depend:95 lay L1:94


Note that the names of payload format (encoding) and Media subtypes 
are case-insensitive in both places. Similarly, parameter names are 
case-insensitive both in Media types and in the default mapping to 
the SDP a=fmtp attribute. 


8. IANA Considerations


Three new Media subtypes, audio/ATRAC3, audio/ATRAC-X, and 
audio/ATRAC-ADVANCED-LOSSLESS, have been registered (see Section 7). 


9. Security Considerations


The payload format as described in this document is subject to the 
security considerations defined in RFC 3550 [1] and any applicable 
profile, for example, RFC 3551 [3]. Also, the security of Media type 
registration MUST be taken into account as described in Section 5 of 
RFC 4855 [6]. 


The payload for the ATRAC family consists solely of compressed audio
data to be decoded and presented as sound, and the standard
specifications of ATRAC3, ATRAC-X, and ATRAC Advanced Lossless [9]
[10] [11] strictly define the bit stream syntax and the decoder-side
buffer model for each codec. They therefore cannot carry "active
content" that could impose malicious side effects upon the receiver,
and they do not cause illegal resource consumption on the receiver
side, as long as the bit streams conform to their standard
specifications.


This payload format does not implement any security mechanisms of its
own. Confidentiality, integrity protection, and authentication have
to be provided by a mechanism external to this payload format, e.g.,
SRTP (RFC 3711 [13]).


10. Considerations on Correct Decoding


10.1. Verification of the Packets


Verification of the received encoded audio packets MUST be performed
so as to ensure correct decoding of the packets. As the most basic
check, an implementation can compare the packet size with the payload
length. If the UDP packet length is longer than the RTP packet
length, the packet can be accepted, but the extra bytes MUST be
ignored. If a shorter UDP packet or an improperly encoded packet is
received, the packet MUST be discarded.
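

The length comparison described above can be sketched as follows. This
Python example is illustrative only: the function name is an
assumption, and "expected_length" stands for the RTP packet length
that the receiver has derived from the packet's own headers, a step
that is not shown here.

```python
from typing import Optional


def verify_packet(datagram: bytes, expected_length: int) -> Optional[bytes]:
    """Return the usable packet bytes, or None if the packet is discarded."""
    if len(datagram) < expected_length:
        # Shorter than the RTP packet length: the packet is discarded.
        return None
    # Longer packets are accepted, but the extra bytes are ignored.
    return datagram[:expected_length]
```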


10.2. Validity Checking of the Packets 


Also, validity checking of the received audio packets MUST be
performed. It can be carried out by the decoding process, as the
ATRAC format is designed so that the validity of data frames can be
determined by the decoding algorithm. The required decoder response
to a malformed frame is to discard the malformed data and conceal the 
errors in the audio output until a valid frame is detected and 
decoded. This is expected to prevent crashes and other abnormal 
decoder behavior in response to errors or attacks. 
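

The decoder response described above can be sketched as a loop that
replaces every malformed frame with concealment output. In the Python
example below, the "decode" callable is a hypothetical stand-in for an
ATRAC decoder that raises ValueError on a malformed frame, and silence
is used as the simplest possible concealment; neither choice is
mandated by this document.

```python
def decode_with_concealment(frames, decode, frame_samples):
    """Decode frames in order, concealing malformed ones with silence."""
    output = []
    for frame in frames:
        try:
            pcm = decode(frame)
        except ValueError:
            # Malformed frame: discard it and emit silence in its place.
            pcm = [0] * frame_samples
        output.append(pcm)
    return output
```

Decoding resumes normally with the next valid frame, so a single
malformed frame does not interrupt the remainder of the stream.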